65 research outputs found

    Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database

    The accurate prediction of solubility of drugs is still problematic. It was thought for a long time that shortfalls had been due to the lack of high-quality solubility data from the chemical space of drugs. This study considers the quality of solubility data, particularly of ionizable drugs. A database is described, comprising 6355 entries of intrinsic solubility for 3014 different molecules, drawing on 1325 citations. In an earlier publication, many factors affecting the quality of the measurement had been discussed, and suggestions were offered to improve ways of extracting more reliable information from legacy data. Many of the suggestions have been implemented in this study. By correcting solubility for ionization (i.e., deriving intrinsic solubility, S0) and by normalizing temperature (by transforming measurements performed in the range 10-50 °C to 25 °C), it can now be estimated that the average interlaboratory reproducibility is 0.17 log unit. Empirical methods to predict solubility at best have hovered around the root mean square error (RMSE) of 0.6 log unit. Three prediction methods are compared here: (a) Yalkowsky’s general solubility equation (GSE), (b) Abraham solvation equation (ABSOLV), and (c) Random Forest regression (RFR) statistical machine learning. The latter two methods were trained using the new database. The RFR method outperforms the other two models, as anticipated. However, the ability to predict the solubility of drugs to the level of the quality of data is still out of reach. The data quality is not the limiting factor in prediction. The statistical machine learning methodologies are probably up to the task. Possibly what’s missing are solubility data from a few sparsely covered regions of the chemical space of drugs (particularly of research compounds).
Also, new descriptors which can better differentiate the factors affecting solubility between molecules could be critical for narrowing the gap between the accuracy of the prediction models and that of the experimental data.

    Anomalous salting-out, self-association and pKa effects in the practically-insoluble bromothymol blue

    Background and Purpose: The widely-used and practically insoluble diprotic acidic dye, bromothymol blue (BTB), is a neutral molecule in strongly acidic aqueous solutions. The Schill (1964) extensive solubility-pH measurement of bromothymol blue in 0.1 and 1.0 M NaCl solutions, with pH adjusted with HCl from 0.0 to 5.4, featured several unusual findings. The data suggest that the solubility of the neutral-form molecule in 1 M NaCl is more than 0.7 log unit lower than its solubility in pure water. This could be considered uncharacteristically high for a salting-out effect. Also, the study reported two apparent values of pKa1, 1.48 and 1.00, in 0.1 M and 1.0 M NaCl solutions, respectively. The only other measured value found for pKa1 in the literature is -0.66 (Gupta and Cadwallader, 1968). Experimental Approach: It was reasoned that there can be only a single pKa1 for BTB. Also, it was hypothesized that salting-out alone might not account for such a large difference in solubility observed at the two levels of salt. A generalized mass action approach, incorporating activity corrections for charged species using the Stokes-Robinson hydration equation and for neutral species using the Setschenow equation, was selected to analyze the Schill solubility-pH data to seek a rationalization of these unusual results. Key Results: BTB reveals complex speciation chemistry in saturated aqueous solutions which had been poorly understood for many years. The appearance of two different values of pKa1 at different levels of NaCl and the anomalously high value of the empirical salting-out constant could be rationalized to normal values by invoking the formation of a very stable neutral dimer (log K2 = 10.0 ± 0.1 M⁻¹). A ‘normal’ salting-out constant, 0.25 M⁻¹, was then derived. It was also possible to estimate the ‘self-interaction’ constant. The data analysis in the present study critically depended on the pKa1 = -0.66 reported by Gupta and Cadwallader.
Conclusion: A more reasonable salting-out constant and a consistent single value for pKa1 have been determined by considering a self-interacting (aggregation) model involving an uncharged form of the molecule, which is likely a zwitterion, as suggested by literature spectrophotometric studies.
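
To see why a drop of more than 0.7 log unit looked anomalous, it may help to sketch the Setschenow relation for a neutral solute, log10(S_water / S_salt) = Ks × c_salt. The function and variable names below are illustrative assumptions, not taken from the paper; only the 0.25 M⁻¹ constant and the two NaCl concentrations come from the abstract.

```python
def setschenow_log_drop(k_s: float, c_salt: float) -> float:
    """Return log10(S_water / S_salt) for a neutral solute (Setschenow relation).

    k_s    : salting-out constant in M^-1
    c_salt : salt concentration in M
    """
    return k_s * c_salt


# 'Normal' salting-out constant quoted in the abstract (M^-1).
K_S_NORMAL = 0.25

# Expected lowering of log S at the two NaCl levels used by Schill.
drop_01 = setschenow_log_drop(K_S_NORMAL, 0.1)
drop_10 = setschenow_log_drop(K_S_NORMAL, 1.0)

print(f"0.1 M NaCl: log S lowered by about {drop_01:.3f}")
print(f"1.0 M NaCl: log S lowered by about {drop_10:.3f}")
```

Under this simple relation, a constant of 0.25 M⁻¹ accounts for only about a quarter of a log unit at 1.0 M NaCl, which is consistent with the abstract's argument that salting-out alone does not explain the observed difference.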

    Multi-lab intrinsic solubility measurement reproducibility in CheqSol and shake-flask methods

    This commentary compares 233 CheqSol intrinsic solubility values (log S0) reported in the Wiki-pS0 database for 145 different druglike molecules to the 838 log S0 values determined mostly by the saturation shake-flask (SSF) method for 124 of the molecules from the CheqSol set. The range of log S0 spans from -1.0 to -10.6 (log molar units), averaging -3.8. The correlation plot between the two methods indicates r² = 0.96, RMSE = 0.34 log unit, and a slight bias of -0.07 log unit. The average interlaboratory standard deviation (SDi) is slightly better for the CheqSol set than that of the SSF set: SDi(CS) = 0.15 and SDi(SSF) = 0.24. The intralaboratory errors reported in the CheqSol method (0.05 log) need to be multiplied by a factor of 3 to match the expected interlaboratory errors for the method. The scale factor, in part, relates to the hidden systematic errors in the single-lab values. It is expected that improved standardizations in the ‘gold standard’ SSF method, as suggested in the recent ‘white paper’ on solubility measurement methodology, should bring the SDi of both methods to about 0.15 log unit. The multi-lab averaged log S0 (and the corresponding SDi) values could be helpful additions to existing training-set molecules used to predict the intrinsic solubility of drugs and druglike molecules.

    Do you know your r²?

    The prediction of solubility of drugs usually calls on the use of several open-source/commercially-available computer programs in the various calculation steps. Popular statistics to indicate the strength of the prediction model include the coefficient of determination (r²), Pearson’s linear correlation coefficient (rPearson), and the root-mean-square error (RMSE), among many others. When a program calculates these statistics, slightly different definitions may be used. This commentary briefly reviews the definitions of three types of r² and RMSE statistics (model validation, bias compensation, and Pearson) and how systematic errors due to shortcomings in solubility prediction models can be differently indicated by the choice of statistical indices. The indices we have employed in recently published papers on the prediction of solubility of druglike molecules were unclear, especially in cases of drugs from ‘beyond the Rule of 5’ chemical space, as simple prediction models showed distinctive ‘bias-tilt’ systematic type scatter.
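
The difference between the three kinds of statistics can be sketched in a few lines. The definitions below are common textbook forms and are assumptions here, since the abstract does not spell them out: the "bias compensation" variant is taken to mean subtracting the mean residual before computing r² and RMSE, and the bias sign convention (observed minus predicted) is also an assumption. The data are made up for illustration.

```python
import math


def solubility_fit_stats(obs, pred):
    """Compute validation, bias-compensated and Pearson statistics.

    obs, pred : equal-length sequences of observed / predicted log S0.
    """
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    resid = [o - p for o, p in zip(obs, pred)]
    bias = sum(resid) / n  # mean residual (assumed sign: obs - pred)

    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    ss_res = sum(e ** 2 for e in resid)
    ss_res_bc = sum((e - bias) ** 2 for e in resid)  # bias removed
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    ss_pred = sum((p - mean_pred) ** 2 for p in pred)

    return {
        "bias": bias,
        "r2_validation": 1 - ss_res / ss_tot,
        "rmse_validation": math.sqrt(ss_res / n),
        "r2_bias_comp": 1 - ss_res_bc / ss_tot,
        "rmse_bias_comp": math.sqrt(ss_res_bc / n),
        "r2_pearson": cov ** 2 / (ss_tot * ss_pred),
    }


# Made-up log S0 values: every prediction is off by a constant 0.5 log unit.
observed = [-2.0, -3.0, -4.0, -5.0, -6.0]
predicted = [o - 0.5 for o in observed]
stats = solubility_fit_stats(observed, predicted)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With a purely constant offset, the Pearson and bias-compensated r² are both 1 while the validation r² is lower, which illustrates how a systematic error can be hidden or exposed depending on the index reported.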

    Can small drugs predict the intrinsic aqueous solubility of ‘beyond Rule of 5’ big drugs?

    The aim of the study was to explore to what extent small molecules (mostly from the Rule of 5 chemical space) can be used to predict the intrinsic aqueous solubility, S0, of big molecules from beyond the Rule of 5 (bRo5) space. It was demonstrated that the General Solubility Equation (GSE) and the Abraham Solvation Equation (ABSOLV) underpredict solubility in systematic but slightly different ways. The Random Forest regression (RFR) method predicts solubility more accurately, albeit in the manner of a ‘black box.’ It was discovered that the GSE improves considerably in the case of big molecules when the coefficient of the log P term (octanol-water partition coefficient) in the equation is set to -0.4 instead of the traditional -1 value. The traditional GSE underpredicts solubility for molecules with experimental S0 < 50 µM. In contrast, the ABSOLV equation (trained with small molecules) underpredicts the solubility of big molecules in all cases tested. It was found that the errors in the ABSOLV-predicted solubilities of big molecules correlate linearly with the number of rotatable bonds, which suggests that flexibility may be an important factor in differentiating solubility of small from big molecules. Notably, most of the 31 big molecules considered have negative enthalpy of solution: these big molecules become less soluble with increasing temperature, which is compatible with ‘molecular chameleon’ behavior associated with intramolecular hydrogen bonding. The X‑ray structures of many of these molecules reveal void spaces in their crystal lattices large enough to accommodate many water molecules when such solids are in contact with aqueous media. The water sorbed into crystals suspended in aqueous solution may enhance solubility by way of intra-lattice solute-water interactions involving the numerous H‑bond acceptors in the big molecules studied. A ‘Solubility Enhancement–Big Molecules’ index was defined, which embodies many of the above findings.
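
The effect of changing the log P coefficient can be sketched with the classical form of the GSE, log S0 = 0.5 − 0.01 (mp − 25) − log P, with mp in °C. In the sketch below only the log P coefficient is made adjustable; keeping the intercept and melting-point constants at their classical values is an assumption, since the abstract does not say whether the study refit them. The input values are hypothetical.

```python
def gse_log_s0(log_p: float, mp_celsius: float, log_p_coeff: float = -1.0) -> float:
    """General Solubility Equation with an adjustable log P coefficient.

    log_p       : octanol-water partition coefficient (log units)
    mp_celsius  : melting point in deg C (term dropped for mp below 25 deg C)
    log_p_coeff : -1.0 is the traditional value; -0.4 is the value the
                  abstract reports as better for big molecules
    """
    melting_term = 0.01 * max(mp_celsius - 25.0, 0.0)
    return 0.5 - melting_term + log_p_coeff * log_p


# Hypothetical big molecule: log P = 5, melting point = 175 deg C.
traditional = gse_log_s0(5.0, 175.0)
modified = gse_log_s0(5.0, 175.0, log_p_coeff=-0.4)
print(f"traditional GSE: log S0 = {traditional:.2f}")
print(f"modified GSE:    log S0 = {modified:.2f}")
```

For a lipophilic molecule the smaller coefficient raises the predicted log S0 substantially, which is in line with the abstract's observation that the traditional GSE underpredicts the solubility of poorly soluble big molecules.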

    Mechanistically transparent models for predicting aqueous solubility of rigid, slightly flexible, and very flexible drugs (MW<2000): Accuracy near that of random forest regression (Alex Avdeef)

    Yalkowsky’s General Solubility Equation (GSE), with its three fixed constants, is popular and easy to apply, but is not very accurate for polar, zwitterionic, or flexible molecules. This review examines the findings of a series of studies, where we have sought to come up with a better prediction model, by comparing the performances of the GSE to Abraham’s Solvation Equation (ABSOLV), and Random Forest regression (RFR) machine-learning (ML) method. Large, well-curated aqueous intrinsic solubility databases are available. However, drugs may be sparsely distributed in chemical space, concentrated in clusters. Even a large database might overlook some regions. Test compounds from under-represented portions of space may be poorly predicted, as might be the case with the ‘loose’ set of 32 drugs in the Second Solubility Challenge (2020). There appears to be still a need for better coverage of drug space. Increasingly, current trends in predictions of solubility use calculated input descriptors, which may be an advantage for exploring properties of molecules yet to be synthesized. The risk may be that overall prediction approaches might be based on accumulated uncertainty. The increasing use of ML/AI methods can lead to accurate predictions, but such predictions may not readily suggest the strategies to pursue in selecting yet-to-be-synthesized compounds. Based on our latest findings, we recommend predictions based on both ‘grouped’ ABSOLV(GRP) and ‘Flexible Acceptor’ GSE(Ω,B) models with the provided best-fit parameters, where Ω is the Kier molecular flexibility index and B is the Abraham H-bond acceptor strength. For molecules with Ω < 11, the prudent choice is to pick the Consensus Model, the average of ABSOLV(GRP) and GSE(Ω,B). For more flexible molecules, GSE(Ω,B) is recommended
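
The recommendation at the end of this abstract amounts to a simple selection rule, sketched below. The two model predictions are taken as already-computed inputs, because the best-fit parameters of ABSOLV(GRP) and GSE(Ω,B) are not given in the abstract. The function name is hypothetical, and assigning a molecule with Ω exactly equal to 11 to the GSE(Ω,B) branch is an assumption.

```python
# Kier flexibility index threshold quoted in the abstract.
FLEXIBILITY_CUTOFF = 11.0


def recommended_log_s0(omega: float,
                       log_s0_gse_flex: float,
                       log_s0_absolv_grp: float) -> float:
    """Pick the recommended log S0 prediction from two model outputs.

    omega             : Kier molecular flexibility index of the molecule
    log_s0_gse_flex   : prediction from the 'Flexible Acceptor' GSE model
    log_s0_absolv_grp : prediction from the 'grouped' ABSOLV model
    """
    if omega < FLEXIBILITY_CUTOFF:
        # Consensus Model: plain average of the two predictions.
        return 0.5 * (log_s0_absolv_grp + log_s0_gse_flex)
    # More flexible molecules: use the GSE-based model alone.
    return log_s0_gse_flex


# Hypothetical predictions for a rigid and a flexible molecule.
rigid = recommended_log_s0(5.0, -4.0, -5.0)
flexible = recommended_log_s0(15.0, -4.0, -5.0)
print(rigid, flexible)
```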

    Anomalous Solubility Behavior of Several Acidic Drugs

    The “anomalous solubility behavior at higher pH values” of several acidic drugs originally studied by Higuchi et al. in 1953 [1], but hitherto not fully rationalized, has been re-analyzed using a novel solubility-pH analysis computer program, pDISOL-X™. The program internally derives implicit solubility equations, given a set of proposed equilibria and constants (iteratively refined by weighted nonlinear regression), and does not require explicit Henderson-Hasselbalch equations. The re-analyzed original barbital, phenobarbital, oxytetracycline, and sulfathiazole solubility-pH data of Higuchi et al. are consistent with the presence of dimers in saturated solutions. In the cases of barbital, phenobarbital and sulfathiazole, the data indicated anionic dimers, reaching peak concentrations near pH 8. However, oxytetracycline indicated a pronounced tendency to form a cationic dimer, peaking near pH 2. Under the conditions of the original study, only barbital indicated a slight tendency to form a salt precipitate at pH > 6.8, with a highly unusual stoichiometry (consistent with a slope of 0.55 in the log S – pH plot): K⁺ + A₂H⁻ + 3HA ⇌ KA₅H₄(s). Thus the “anomaly” in the Higuchi data can be rationalized by invoking specific aggregated species.
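
For contrast with the implicit equations the program is said to derive, the explicit Henderson-Hasselbalch form for a monoprotic acid with no aggregation or salt precipitation is log S = log S0 + log10(1 + 10^(pH − pKa)). The sketch below uses that textbook form only as a reference curve; it is not the program's method, and the pKa and log S0 values are hypothetical.

```python
import math


def hh_log_s_monoprotic_acid(ph: float, pka: float, log_s0: float) -> float:
    """Henderson-Hasselbalch log S for a monoprotic acid (no aggregates, no salt)."""
    return log_s0 + math.log10(1.0 + 10.0 ** (ph - pka))


# Hypothetical acid: pKa = 7.4, intrinsic solubility log S0 = -2.0.
PKA, LOG_S0 = 7.4, -2.0

for ph in (5.0, 7.4, 9.0, 10.0, 11.0):
    print(f"pH {ph:4.1f}: log S = {hh_log_s_monoprotic_acid(ph, PKA, LOG_S0):.3f}")

# Slope of log S vs pH well above the pKa approaches 1 in this simple model.
slope_high_ph = (hh_log_s_monoprotic_acid(11.4, PKA, LOG_S0)
                 - hh_log_s_monoprotic_acid(10.4, PKA, LOG_S0))
print(f"slope between pH 10.4 and 11.4: {slope_high_ph:.3f}")
```

In this reference model the slope at high pH tends to unity, so a measured slope such as the 0.55 mentioned in the abstract signals that additional species or precipitates need to be considered.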

    Phosphate Precipitates and Water-Soluble Aggregates in Re-analyzed Solubility-pH Data of Twenty-five Basic Drugs

    The purpose of the study was to assess the stoichiometries of phosphate precipitates and determine the intrinsic solubilities, S0, of 25 basic drugs from their published solubility-pH profiles in the landmark study of Bergström et al. (2004), where 0.15 M phosphate buffer media had been used. A secondary purpose of this study was to attempt to predict phosphate 1:1 and 2:1 solubility products, Ksp, from knowledge of S0. The published data have been re-analyzed using a novel solubility-pH analysis computer program, pDISOL-X™. The program internally derives implicit solubility equations, given a set of proposed equilibria and constants (which are then iteratively refined by weighted nonlinear regression), and does not require explicit Henderson-Hasselbalch equations. The data were tested for the presence of phosphate precipitates of various stoichiometries, as well as the simultaneous presence of aggregated species, either cationic or neutral. The presence of particular species was suggested by the slope characteristics of the log S vs. pH curves. Considerably different intrinsic solubility constants were found, compared to those originally reported, for several drugs (e.g., celiprolol, desipramine, haloperidol). The least soluble molecule, amiodarone, was analyzed to have an extraordinarily low intrinsic solubility of 2 picograms/mL, a moderate salt solubility of 0.82 mg/mL at the Gibbs pKa 5.4, corresponding to the species BH∙H2PO4(s), and a substantial presence of the positively-charged pentameric aggregate, (BH)5

    Thickness of the aqueous boundary layer in stirred microtitre plate permeability assays (PAMPA and Caco-2), based on the Levich equation

    The stirring frequency exponent of -1/2 in the theoretical Levich expression appears to apply to PAMPA and Caco-2 assays, where efficient individual-well magnetic stirring (> 20 RPM) is used. If a single molecule is used as a stirring calibrant, then scaling suggested by Eq. (5) with microtitre plate data may be used. The error in calculating hABL based on unscaled hABLref can be as high as 30%. This is of practical importance in PAMPA, and perhaps cellular assays as well.
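
The -1/2 exponent implies that the aqueous boundary layer thickness scales with the inverse square root of the stirring frequency. The sketch below shows only that proportionality; it is not the paper's Eq. (5), which is not reproduced in this listing, and the reference thickness and stirring speeds are hypothetical values.

```python
def scale_h_abl(h_abl_ref: float, rpm_ref: float, rpm: float,
                exponent: float = -0.5) -> float:
    """Scale a reference boundary-layer thickness to another stirring speed.

    h_abl_ref : thickness measured at rpm_ref (any length unit)
    rpm_ref   : stirring speed of the reference measurement
    rpm       : stirring speed of interest
    exponent  : Levich stirring-frequency exponent (-1/2)
    """
    return h_abl_ref * (rpm / rpm_ref) ** exponent


# Hypothetical calibration: 60 um at 100 RPM, rescaled to 400 RPM.
h_400 = scale_h_abl(60.0, 100.0, 400.0)
print(f"h_ABL at 400 RPM: about {h_400:.1f} um")
```

Quadrupling the stirring speed halves the estimated thickness under this proportionality.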